Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings
Abstract
Laughter is a key element of human-human interaction, occurring surprisingly frequently in multi-party conversation. In meetings, laughter accounts for almost 10% of vocalization effort by time, and is known to be relevant for topic segmentation and the automatic characterization of affect. We present a system for the detection of laughter, and its attribution to specific participants, which relies on simultaneously decoding the vocal activity of all participants given multi-channel recordings. The proposed framework allows us to disambiguate laughter and speech not only acoustically, but also by constraining the number of simultaneous speakers and the number of simultaneous laughers independently, since participants tend to take turns speaking but laugh together. We present experiments on 57 hours of meeting data, containing almost 11000 unique instances of laughter.
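The independent constraints described above can be illustrated with a small sketch. It is not the authors' system: it only enumerates the joint vocal-activity states of a meeting, where each participant is silent, speaking, or laughing, and keeps those that respect separate caps on simultaneous speakers and simultaneous laughers. The function name `constrained_joint_states`, the state labels, and the example values (4 participants, at most 2 speakers, at most 3 laughers) are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical per-participant state labels (not taken from the paper).
SILENCE, SPEECH, LAUGHTER = "N", "S", "L"


def constrained_joint_states(num_participants, max_speakers, max_laughers):
    """Enumerate joint states under independent caps on speakers and laughers."""
    kept = []
    for state in product((SILENCE, SPEECH, LAUGHTER), repeat=num_participants):
        # The two caps are checked separately, so a low speaker cap
        # (turn-taking) can coexist with a high laugher cap (laughing together).
        if state.count(SPEECH) <= max_speakers and state.count(LAUGHTER) <= max_laughers:
            kept.append(state)
    return kept


# Illustrative values only: 4 participants, at most 2 speakers, at most 3 laughers.
states = constrained_joint_states(num_participants=4, max_speakers=2, max_laughers=3)
print(len(states), "of", 3 ** 4, "joint states kept")
```

A decoder that searches only over the kept states cannot hypothesize, say, three simultaneous speakers, while still allowing several participants to laugh at once. How the caps are chosen and how the states are scored acoustically is outside the scope of this sketch.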
Similar resources
"But let me talk": An Investigation into Teachers' Interaction Patterns in EFL Classrooms
Drawing on Walsh's (2012) idea that boosting learners' contribution and interaction can play a key role in their foreign language learning, this mixed-methods study tried to shed some light on the ways in which teachers, via their choice and use of language, create or block learners' contribution in direct interactions in the classroom. A total of 800 minutes of recordings of 10 teachers' talks and...
Laughter Detection in Meetings
We build a system to automatically detect laughter events in meetings, where laughter events are defined as points in the meeting where a number of the participants (more than just one) are laughing simultaneously. We implement our system using a support vector machine classifier trained on mel-frequency cepstral coefficients (MFCCs), delta MFCCs, modulation spectrum, and spatial cues from the ...
A Supervised Factorial Acoustic Model for Simultaneous Multiparticipant Vocal Activity Detection in Close-talk Microphone Recordings of Meetings
Using automatic speech recognition (ASR) word error rates (WERs) as a metric, the systems in (1) and (3) appear to yield similar performance, in spite of significant additional architectural differences. Systems of type (2) have not been fielded for segmentation for ASR, and therefore cannot be directly compared. Although approaches of type (3) offer a significant advantage, namely the opp...
Demonstrating Laughter Detection in Natural Discourses
This work focuses on the demonstration of previously achieved results in the automatic detection of laughter from natural discourses. In the previous work, features of two different modalities, namely audio and video from unobtrusive sources, were used to build a system of recurrent neural networks called Echo State networks to model the dynamics of laughter. This model was then again utilized t...
Score-Informed Source Separation for Multichannel Orchestral Recordings
This paper proposes a system for score-informed audio source separation for multichannel orchestral recordings. The orchestral music repertoire relies on the existence of scores. Thus, a reliable separation requires a good alignment of the score with the audio of the performance. To that extent, automatic score alignment methods are reliable when allowing a tolerance window around the actual on...